SystemML's Optimizer: Plan Generation for Large-Scale Machine Learning Programs

Authors

  • Matthias Boehm
  • Douglas Burdick
  • Alexandre V. Evfimievski
  • Berthold Reinwald
  • Frederick Reiss
  • Prithviraj Sen
  • Shirish Tatikonda
  • Yuanyuan Tian
Abstract

SystemML enables declarative, large-scale machine learning (ML) via a high-level language with R-like syntax. Data scientists use this language to express their ML algorithms with full flexibility but without the need to hand-tune distributed runtime execution plans and system configurations. These ML programs are dynamically compiled and optimized based on data and cluster characteristics using rule- and cost-based optimization techniques. The compiler automatically generates hybrid runtime execution plans ranging from in-memory, single-node execution to distributed MapReduce (MR) computation and data access. This paper describes the SystemML optimizer, its compilation chain, and selected optimization phases for generating efficient execution plans.
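
To make the idea of a hybrid plan concrete, the following is a minimal sketch of how a compiler might choose between in-memory, single-node execution and distributed execution from data and cluster characteristics. It is not SystemML's actual optimizer code: the names `MatrixStats`, `estimate_bytes`, and `choose_exec_type`, the 8-bytes-per-non-zero size model, and the single memory budget parameter are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class MatrixStats:
    """Hypothetical data characteristics known at compile time."""
    rows: int
    cols: int
    sparsity: float = 1.0  # expected fraction of non-zero cells


def estimate_bytes(m: MatrixStats) -> int:
    """Rough size estimate: 8 bytes per expected non-zero double value.

    Assumption: real systems also account for sparse-format overheads.
    """
    return int(m.rows * m.cols * m.sparsity * 8)


def choose_exec_type(inputs: Sequence[MatrixStats],
                     output: MatrixStats,
                     memory_budget_bytes: int) -> str:
    """Pick single-node execution if all operands fit in the budget."""
    total = sum(estimate_bytes(m) for m in inputs) + estimate_bytes(output)
    return "single_node" if total <= memory_budget_bytes else "distributed"


if __name__ == "__main__":
    budget = 1 * 1024 ** 3  # assumed 1 GiB memory budget
    small = MatrixStats(rows=1_000, cols=100)
    large = MatrixStats(rows=100_000_000, cols=1_000)
    vec = MatrixStats(rows=1_000, cols=1)
    print(choose_exec_type([small], vec, budget))
    print(choose_exec_type([large], vec, budget))
```

Under this simplified model the same operator can receive a different execution type as input sizes change, which is the sense in which a single program yields a hybrid plan.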


Similar resources

A Job Shop Scheduling Problem with Sequence-Dependent Setup Times Considering Position-Based Learning Effects and Availability Constraints

Sequence-dependent setup times (SDSTs), availability constraints, and transportation times are interesting and important issues in production management that are often addressed separately. This paper studies the SDST job shop scheduling problem with position-based learning effects, job-dependent transportation times, and multiple preventive maintenance activities. ...


Costing Generated Runtime Execution Plans for Large-Scale Machine Learning Programs

Declarative large-scale machine learning (ML) aims at the specification of ML algorithms in a high-level language and automatic generation of hybrid runtime execution plans ranging from single node, in-memory computations to distributed computations on MapReduce (MR) or similar frameworks like Spark. The compilation of large-scale ML programs exhibits many opportunities for automatic optimizati...


Distributed multi-agent Load Frequency Control for a Large-scale Power System Optimized by Grey Wolf Optimizer

This paper aims to design an optimal distributed multi-agent controller for load frequency control and optimal power flow. The controller parameters are optimized using the Grey Wolf Optimization (GWO) algorithm. The designed controller is applied to load frequency control in the IEEE 30-bus test system with six generators. The controller of each generator is consider...


Programming abstractions, compilation, and execution techniques for massively parallel data analysis

We are witnessing an explosion in the amount of available data. Today, businesses and scientific institutions have the opportunity to analyze empirical data at unprecedented scale. For many companies, the analysis of their accumulated data is nowadays a key strategic aspect. Today’s analysis programs consist not only of traditional relational-style queries, but they use increasingly more complex d...


Modular Resource Centric Learning for Workflow Performance Prediction

Workflows provide an expressive programming model for fine-grained control of large-scale applications in distributed computing environments. Accurate estimates of complex workflow execution metrics on large-scale machines have several key advantages. The performance of scheduling algorithms that rely on estimates of execution metrics degrades when the accuracy of predicted execution metrics de...



Journal:
  • IEEE Data Eng. Bull.

Volume: 37

Publication date: 2014